Skylens: Visual Analysis of Skyline on Multi-dimensional Data

Theoretical Background

Users often need to make decisions based on multi-dimensional criteria. However, the number of available options may be vast, which makes it very hard for users to make the best possible decision. Skyline queries address this problem by reducing the number of options presented to the user. The reduction is done by calculating the skyline of the data set and presenting only the points contained in this skyline (i.e. the skyline points). A point of the data set is a skyline point if and only if it is not dominated by any other point of the data set. A point p dominates another point q if and only if p is at least as good as q in all dimensions and better than q in at least one dimension. Therefore, all points that are not part of the skyline can be dismissed safely: whenever a user chooses such a point, a better option exists in the skyline.
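The dominance definition above can be sketched in a few lines of JavaScript. This is a minimal, quadratic-time illustration and not necessarily how our application computes the skyline. It assumes that every attribute is numeric and that larger values are better. The names `dominates` and `skyline` and the sample data are made up for this example.

```javascript
// Assumption: points are arrays of numbers and larger is better in every dimension.
// p dominates q if p is at least as good everywhere and strictly better somewhere.
function dominates(p, q) {
  let strictlyBetter = false;
  for (let i = 0; i < p.length; i++) {
    if (p[i] < q[i]) return false; // p is worse in dimension i
    if (p[i] > q[i]) strictlyBetter = true;
  }
  return strictlyBetter;
}

// A point belongs to the skyline if no other point dominates it.
function skyline(points) {
  return points.filter((p) => !points.some((q) => dominates(q, p)));
}

// Hypothetical two-dimensional data set.
const options = [[3, 9], [5, 5], [9, 2], [4, 4], [2, 8]];
const skylinePoints = skyline(options);
console.log(skylinePoints); // [[3, 9], [5, 5], [9, 2]]
```

Here [4, 4] is dropped because [5, 5] dominates it, and [2, 8] is dropped because [3, 9] dominates it.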

A skyline query is already useful because it reduces the amount of data and presents only a subset of superior points to the user. However, users still need to interpret and compare the extracted skyline points manually before making a choice based on their preferences. Making the best possible decision therefore requires a deep understanding of the skyline. This quickly becomes difficult as the skyline grows, which happens when the input data has many dimensions.

Zhao et al. [1] presented a visual analytics tool called Skylens that allows the user to explore the skyline of a data set interactively. Skylens organizes the data and aims to help with the interpretation and comparison of multi-dimensional data. The tool has a tabular design, consisting of charts and tables, that summarizes attribute-wise rankings and differences between skyline points. This design helps users inspect points and better understand why a point is superior to others. This is important because a point may not be superior in every single attribute, while a user may care only about a few attributes in which this point cannot be beaten. A plain skyline query hides this information, because it only returns the superior points without showing why they are superior to others.

User preference is another important consideration when designing an application that presents data for exploration and comparison. When comparing superior points, it is hard to assign weights that represent preferences to attributes, because user preferences tend to be unstable and inconsistent. Skylens tries to account for this by presenting superior points from different perspectives and at different scales. Users can take a wider view to discover larger clusters and outliers in the entire skyline, or a much narrower view to explore the differences within a small set of skyline points. With these features, Skylens gives users a set of different views for exploring and comparing data and aims to assist them in making the best possible decision.

Implementation

Our application is written entirely in HTML/CSS/JS and does not use any server-side technologies. The backbone of our application is d3.js, which allows us to create complex visualizations as SVGs. Our application reads plain CSV files and processes the data after loading. Because everything is processed in the browser, our application is not feasible for loading huge datasets, since the initial loading and processing time grows exponentially. For the general layout of the application we used Bootstrap and our own CSS code. To compute the embedding of the skyline points for the projection view, we used tSNEJS. Since we also needed some vector processing, we additionally used victor.js to simplify this task.

Application

Our application consists of four main regions: the projection view, the comparison view, the attribute table, and the tabular view.

Screenshot of the complete application

Projection view

The projection view is an embedding of the multi-dimensional data into a 2D plane using the t-distributed stochastic neighbor embedding (t-SNE) method. Unlike PCA, t-SNE is not a linear projection, so the axes of the embedding do not correspond directly to the original attributes. However, it allows users to quickly identify certain patterns (e.g. clusters) in the skyline of the data.

Each skyline point is represented by a skyline glyph. The center of the skyline glyph encodes the domination score of the skyline point, i.e. how many other points in the dataset it dominates (red = high domination score, white = low domination score). The outer circle segments represent the attribute values of the skyline point, where the radius of each segment corresponds to the value of its attribute (larger value => larger radius).
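As a rough illustration of the domination score and its color encoding, the following sketch counts the dominated points for each skyline point and maps the score to a color between white and red. It assumes numeric attributes where larger is better. The linear white-to-red interpolation, the function names and the sample data are assumptions made for this example, and the actual color scale in our application may differ.

```javascript
// Assumption: points are arrays of numbers and larger is better in every dimension.
function dominates(p, q) {
  let strictlyBetter = false;
  for (let i = 0; i < p.length; i++) {
    if (p[i] < q[i]) return false;
    if (p[i] > q[i]) strictlyBetter = true;
  }
  return strictlyBetter;
}

// Domination score of a point = number of points in the dataset it dominates.
function dominationScores(points) {
  const sky = points.filter((p) => !points.some((q) => dominates(q, p)));
  return sky.map((p) => ({
    point: p,
    score: points.filter((q) => dominates(p, q)).length,
  }));
}

// Hypothetical linear color scale: score 0 -> white, maximum score -> red.
function scoreToColor(score, maxScore) {
  const t = maxScore === 0 ? 0 : score / maxScore;
  const gb = Math.round(255 * (1 - t));
  return `rgb(255, ${gb}, ${gb})`;
}

// Hypothetical data set.
const dataset = [[3, 9], [5, 5], [9, 2], [4, 4], [2, 8], [1, 1]];
const scored = dominationScores(dataset);
const maxScore = Math.max(...scored.map((s) => s.score));
const colors = scored.map((s) => scoreToColor(s.score, maxScore));
console.log(scored.map((s) => s.score)); // [2, 2, 1]
```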

Projection view

Comparison view

The comparison view allows the user to closely compare up to four selected skyline points. Each selected skyline point is represented by a glyph. The domination score of a point is represented by the radius of the dashed circle. Radial lines from the center of the glyph represent the attributes, and a line that runs once around the whole glyph represents the attribute values of the skyline point (at the positions where it meets the radial attribute lines). Additionally, each attribute value is encoded with a circle whose radius corresponds to the relative ranking of this point's attribute value compared to the attribute values of all other points. When the user hovers over a glyph in the comparison view, the glyph is drawn enlarged in an overlay view, which additionally encodes the distribution of all attributes.

The comparison view also contains domination glyphs between the skyline point glyphs. They encode the relation between the domination scores of the connected skyline points. The inner pie chart compares the domination scores of the connected skyline points, and the outer pie chart shows how many of the dominated points are exclusively dominated by the corresponding skyline point. When the user hovers over a domination glyph, the overlay shows a comparison between the attribute values of the skyline points connected to the domination glyph.
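The quantities behind a domination glyph can be sketched as follows. This is only an illustration under the assumption that larger attribute values are better. The function `dominationGlyphData`, its return shape and the sample data are hypothetical and were chosen for this example.

```javascript
// Assumption: points are arrays of numbers and larger is better in every dimension.
function dominates(p, q) {
  let strictlyBetter = false;
  for (let i = 0; i < p.length; i++) {
    if (p[i] < q[i]) return false;
    if (p[i] > q[i]) strictlyBetter = true;
  }
  return strictlyBetter;
}

// Collects the numbers a domination glyph between skyline points a and b would encode.
function dominationGlyphData(a, b, points) {
  const dominatedByA = points.filter((q) => dominates(a, q));
  const dominatedByB = points.filter((q) => dominates(b, q));
  return {
    // inner pie chart: the two domination scores
    scoreA: dominatedByA.length,
    scoreB: dominatedByB.length,
    // outer pie chart: points dominated by only one of the two skyline points
    exclusiveA: dominatedByA.filter((q) => !dominatedByB.includes(q)).length,
    exclusiveB: dominatedByB.filter((q) => !dominatedByA.includes(q)).length,
  };
}

// Hypothetical data set; [3, 9] and [5, 5] are both skyline points.
const allPoints = [[3, 9], [5, 5], [9, 2], [4, 4], [2, 8], [1, 1]];
const glyph = dominationGlyphData(allPoints[0], allPoints[1], allPoints);
console.log(glyph); // { scoreA: 2, scoreB: 2, exclusiveA: 1, exclusiveB: 1 }
```

In this example both points dominate [1, 1], so only one of the two points dominated by each skyline point counts as exclusive.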

Comparison view
Comparison view overlay

Attribute table

The attribute table shows a summary of all attributes of the loaded data. This summary contains the names of the attributes, their types and, for numeric data, the range of the attribute values. Since our application only works with numeric data, the nominal data columns are not considered in any calculations and are listed mostly for completeness.
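A summary like the one in the attribute table could be derived from parsed CSV rows roughly as shown below. The sketch assumes that each row is an object that maps column names to strings, and that a column counts as numeric only if every value parses as a finite number. The function name, the `PTS` column and all values are made up for illustration.

```javascript
// Assumption: rows are objects mapping column names to string values (as read from a CSV file).
function summarizeAttributes(rows) {
  const names = Object.keys(rows[0]);
  return names.map((name) => {
    const values = rows.map((row) => row[name]);
    // A column is treated as numeric only if every value is a finite number.
    const isNumeric = values.every(
      (v) => String(v).trim() !== "" && Number.isFinite(Number(v))
    );
    if (!isNumeric) return { name, type: "nominal" };
    const numbers = values.map(Number);
    return {
      name,
      type: "numeric",
      min: numbers.reduce((a, b) => Math.min(a, b)),
      max: numbers.reduce((a, b) => Math.max(a, b)),
    };
  });
}

// Hypothetical rows; the values are not taken from a real dataset.
const sampleRows = [
  { Player: "Player A", GP: "72", PTS: "30.4" },
  { Player: "Player B", GP: "18", PTS: "12.1" },
  { Player: "Player C", GP: "25", PTS: "21.0" },
];
const attributeSummary = summarizeAttributes(sampleRows);
console.log(attributeSummary);
// [ { name: 'Player', type: 'nominal' },
//   { name: 'GP', type: 'numeric', min: 18, max: 72 },
//   { name: 'PTS', type: 'numeric', min: 12.1, max: 30.4 } ]
```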

Attribute table

Tabular view

The tabular view lists the points of the dataset and gives users a detailed view of individual points, as well as of how good they are compared to others. By default it shows only skyline points, but this can be changed with the selection at the top of the table. The case-insensitive search box to the right lets users highlight certain points according to a search criterion. For example, typing 'james' highlights all rows that contain the name 'james' in red, and the table scrolls the first match into view. Partial names like 'mes' also find the player. The table also supports advanced filtering with the operators greater than ('>'), smaller than ('<') and equals ('=') to search for specific attribute values. For example, searching for 'gp > 20' finds all points where the value of the attribute 'GP' is greater than 20.
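The search behaviour described above could be implemented along the following lines. This is a simplified sketch and not the code of our application. It assumes that rows are plain objects, that a text query is matched against all cell values, and that a filter has the form "attribute operator number". The function names and sample values are hypothetical.

```javascript
// Splits a query into either a comparison filter ('gp > 20') or a plain text search.
function parseQuery(input) {
  const m = input.match(/^\s*([^<>=]+?)\s*([<>=])\s*(-?\d+(?:\.\d+)?)\s*$/);
  if (m) {
    return { kind: "compare", attribute: m[1].toLowerCase(), op: m[2], value: Number(m[3]) };
  }
  return { kind: "text", needle: input.trim().toLowerCase() };
}

// Checks whether a row matches a parsed query (case-insensitive).
function matchesQuery(row, query) {
  if (query.kind === "text") {
    return Object.values(row).some((v) => String(v).toLowerCase().includes(query.needle));
  }
  const key = Object.keys(row).find((k) => k.toLowerCase() === query.attribute);
  if (key === undefined) return false; // unknown attribute: no match
  const value = Number(row[key]);
  if (query.op === ">") return value > query.value;
  if (query.op === "<") return value < query.value;
  return value === query.value;
}

// Hypothetical rows; the GP values are invented for this example.
const tableRows = [
  { Player: "James Harden", GP: 72 },
  { Player: "Player B", GP: 18 },
  { Player: "Player C", GP: 25 },
];
const byName = tableRows.filter((r) => matchesQuery(r, parseQuery("mes")));
const byFilter = tableRows.filter((r) => matchesQuery(r, parseQuery("gp > 20")));
console.log(byName.map((r) => r.Player));   // ['James Harden']
console.log(byFilter.map((r) => r.Player)); // ['James Harden', 'Player C']
```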

The header of the table displays the value distribution per attribute, with vertical lines showing where each point of the dataset lies within an attribute. The respective vertical line is highlighted in red when the user hovers over a row in the table. This helps users quickly identify where a certain point lies in the entire dataset. The body of the table shows the actual points as rows, with their attributes as columns. Each table cell shows a diverging bar chart in which the number of bars is equal to the number of points in the entire dataset, so each bar corresponds to one point in the dataset. The bars are sorted in ascending order by their values in that dimension. A purple bar highlights the position of the row's point in this ordering to give the user an idea of how well the point performs. The height of a blue bar represents the summed differences to the row's point in all other dimensions. Bars above the dashed middle line show positive differences, while bars below show negative differences. Hovering over any of the diverging bar charts also highlights the point in the projection view and shows its value. Clicking on any of the charts adds the point to the comparison view, while clicking on the first or second column of a row expands the row and reveals a more detailed view of the point below the diverging bar chart (see player 'James Harden' in the screenshot).
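The data behind one diverging bar chart cell could be prepared roughly as follows. The sketch rests on several assumptions that the text does not spell out: a bar's height is the sum of (other point minus row point) over all dimensions except the cell's attribute, and the values are used without normalization. With real data the attributes would probably need to be normalized first. The function name and the sample data are hypothetical.

```javascript
// Builds the bars for the cell (row point = points[selectedIndex], column = attribute).
// Assumption: height = sum over all OTHER dimensions of (q[d] - selected[d]).
function divergingBars(points, selectedIndex, attribute) {
  const selected = points[selectedIndex];
  return points
    .map((q, index) => {
      let height = 0;
      for (let d = 0; d < q.length; d++) {
        if (d !== attribute) height += q[d] - selected[d];
      }
      return {
        index,                              // position of the point in the dataset
        value: q[attribute],                // value used for sorting the bars
        height,                             // > 0: drawn above the middle line, < 0: below
        isSelected: index === selectedIndex // the bar highlighted in purple
      };
    })
    .sort((a, b) => a.value - b.value);     // ascending by the cell's attribute
}

// Hypothetical three-dimensional data; the row point is points[1], the column is attribute 0.
const bars = divergingBars([[8, 1, 6], [5, 5, 5], [2, 9, 4]], 1, 0);
console.log(bars.map((b) => b.index));  // [2, 1, 0]
console.log(bars.map((b) => b.height)); // [3, 0, -3]
```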

Tabular view

The columns of the matrix below the diverging bar chart of an attribute are aligned with the bars in the diverging bar chart. The rows of the matrix represent the attributes and have the same order as the columns in the table. The bars in each row are sorted in the same way as the diverging bar chart above, so each bar in a matrix column corresponds to the same point. Colors show the difference between the selected point and all the others: blue bars have higher values than the point marked in purple, and red bars have lower values. To the left of the matrix, purple bars indicate the decisive subspaces (smallest combinations of attributes) where the point is in the skyline. Each row represents a dimension, so if a point is in a subspace skyline, purple bars are placed along a vertical line in the rows of the corresponding dimensions. The more subspace skylines a point is in, the stronger it is in different subspaces. Therefore, a point with many decisive subspaces is usually preferred. The algorithm that determines the subspaces is based on the approach of Pei et al. [2].
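To illustrate the idea of subspace skylines, the following brute-force sketch enumerates every non-empty combination of attributes, checks whether a point is in the skyline of that subspace, and keeps the smallest combinations. It is not the algorithm of Pei et al. [2] and not the implementation used in our application. It only serves as an example, assumes that larger values are better, and its running time grows exponentially with the number of attributes. All names and data are hypothetical.

```javascript
// Dominance restricted to the dimensions listed in dims (larger is better).
function dominatesIn(p, q, dims) {
  let strictlyBetter = false;
  for (const d of dims) {
    if (p[d] < q[d]) return false;
    if (p[d] > q[d]) strictlyBetter = true;
  }
  return strictlyBetter;
}

// All subspaces (arrays of dimension indices) in which `point` is a skyline point.
function skylineSubspaces(point, points) {
  const n = point.length;
  const result = [];
  for (let mask = 1; mask < (1 << n); mask++) {
    const dims = [];
    for (let d = 0; d < n; d++) {
      if (mask & (1 << d)) dims.push(d);
    }
    if (!points.some((q) => dominatesIn(q, point, dims))) result.push(dims);
  }
  return result;
}

// Keeps only the subspaces that do not contain a smaller subspace from the list.
function minimalSubspaces(subspaces) {
  return subspaces.filter(
    (s) => !subspaces.some((t) => t.length < s.length && t.every((d) => s.includes(d)))
  );
}

// Hypothetical two-dimensional data set.
const pts = [[3, 9], [5, 5], [9, 2], [4, 4]];
const subspacesOfBalanced = minimalSubspaces(skylineSubspaces(pts[1], pts));
const subspacesOfExtreme = minimalSubspaces(skylineSubspaces(pts[2], pts));
console.log(subspacesOfBalanced); // [[0, 1]]
console.log(subspacesOfExtreme);  // [[0]]
```

In this example [9, 2] is already a skyline point when only the first attribute is considered, whereas [5, 5] needs both attributes to be in the skyline.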

Links

Our work

Original author's work

External libraries

Datasets

References

  1. X. Zhao, et al., "SkyLens: Visual Analysis of Skyline on Multi-Dimensional Data," in IEEE Transactions on Visualization & Computer Graphics, vol. 24, no. 1, pp. 246-255, 2018.
  2. J. Pei, W. Jin, M. Ester, and Y. Tao, "Catching the best views of skyline: A semantic approach based on decisive subspaces," in Proceedings of the 31st International Conference on Very Large Data Bases, pp. 253-264, VLDB Endowment, 2005.